179 research outputs found

    PERCEIVE: Precipitation Data Characterization by means of Frequent Spatio-Temporal Sequences

    Nowadays, large amounts of climate data, including daily precipitation data, are collected by sensors located around the world. A data-driven analysis of these large data sets by means of scalable machine learning and data mining techniques makes it possible to extract knowledge from the data, infer patterns and correlations among sets of spatio-temporal events, and characterize them. In this paper, we describe the PERCEIVE framework. PERCEIVE is a data-driven framework based on frequent spatio-temporal sequences that aims to extract frequent correlations among spatio-temporal precipitation events. It is implemented in R and Apache Spark for scalability, and it also provides a visualization module that can be used to show the extracted patterns intuitively. A preliminary set of experiments shows the efficiency and effectiveness of PERCEIVE.
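
    The sketch below illustrates, on a single machine, what counting frequent spatio-temporal sequences can look like. It assumes a hypothetical encoding in which each day is the set of locations that recorded a precipitation event, and it only considers length-2 sequences (an event at location A on day t followed by an event at location B on day t+1). The function name, the location names, and the `min_support` threshold are illustrative assumptions; this is not PERCEIVE's R/Spark implementation.

```python
from collections import Counter
from itertools import product

def frequent_two_step_sequences(daily_events, min_support):
    """Find ordered pairs (A on day t, B on day t+1) that occur frequently.

    daily_events: list of sets; daily_events[t] holds the locations with a
    precipitation event on day t (hypothetical encoding).
    min_support: minimum fraction of consecutive-day windows containing the pair.
    """
    windows = len(daily_events) - 1
    if windows <= 0:
        return {}
    counts = Counter()
    for today, tomorrow in zip(daily_events, daily_events[1:]):
        # Both inputs are sets, so each pair appears at most once per window:
        # the count is the number of windows supporting the pair.
        counts.update(product(sorted(today), sorted(tomorrow)))
    return {
        pair: n / windows
        for pair, n in counts.items()
        if n / windows >= min_support
    }

days = [
    {"Turin", "Milan"},
    {"Milan", "Genoa"},
    {"Turin"},
    {"Milan", "Genoa"},
    set(),
]
print(frequent_two_step_sequences(days, min_support=0.5))
```

    Longer sequences and multi-location events would be mined the same way, with an Apriori-style candidate generation step in place of the exhaustive pair enumeration used here.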

    BAC: A bagged associative classifier for big data frameworks

    Big Data frameworks allow powerful distributed computations that extend the results achievable on a single machine. In this work, we present a novel distributed associative classifier, named BAC, based on ensemble techniques. Ensembles are a popular approach that builds several models on different subsets of the original dataset and eventually lets them vote to provide a single classification outcome. Experiments on Apache Spark and preliminary results showed that the proposed ensemble classifier achieves a quality comparable to the single-machine version on popular real-world datasets and overcomes its scalability limits on large synthetic datasets.
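
    The bagging-and-voting scheme described above can be sketched in a few lines. This is a minimal single-machine illustration under stated assumptions: BAC's base learners are associative classifiers trained on Apache Spark, whereas here a hypothetical one-rule learner (one rule per value of a single categorical feature) stands in for them, and plain Python lists stand in for distributed data.

```python
import random
from collections import Counter, defaultdict

def train_one_rule(rows, labels, feature):
    """Hypothetical stand-in base learner: one rule per value of a categorical feature."""
    by_value = defaultdict(Counter)
    for row, label in zip(rows, labels):
        by_value[row[feature]][label] += 1
    # Fallback class for values never seen in this model's training sample.
    default = Counter(labels).most_common(1)[0][0]
    rules = {value: counts.most_common(1)[0][0] for value, counts in by_value.items()}
    return lambda row: rules.get(row[feature], default)

def train_bagged(rows, labels, feature, n_models=5, seed=0):
    """Train n_models base learners, each on a bootstrap sample of the data."""
    rng = random.Random(seed)
    n = len(rows)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
        models.append(train_one_rule([rows[i] for i in idx],
                                     [labels[i] for i in idx], feature))
    return models

def predict_vote(models, row):
    """Majority vote over the ensemble members' predictions."""
    votes = Counter(model(row) for model in models)
    return votes.most_common(1)[0][0]

rows = [{"color": "red"}] * 20 + [{"color": "blue"}] * 20
labels = ["yes"] * 20 + ["no"] * 20
models = train_bagged(rows, labels, "color", n_models=5, seed=1)
print(predict_vote(models, {"color": "red"}))
```

    Because each member sees only its own sample, training parallelizes naturally across workers; only the final vote needs all members.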

    Scaling associative classification for very large datasets

    Supervised learning algorithms are nowadays successfully scaling up to very large datasets, leveraging the potential of in-memory cluster-computing Big Data frameworks. Still, massive datasets with a number of large-domain categorical features are a difficult challenge for any classifier, and most off-the-shelf solutions cannot cope with them. In this work we introduce DAC, a Distributed Associative Classifier. DAC exploits ensemble learning to distribute the training of an associative classifier among parallel workers and to improve the final quality of the model. Furthermore, it adopts several novel techniques to reach high scalability without sacrificing quality, among them a preventive pruning of classification rules in the extraction phase based on Gini impurity. We ran experiments on Apache Spark on a real large-scale dataset with more than 4 billion records and 800 million distinct categories. The results showed that DAC improves on a state-of-the-art solution in both prediction quality and execution time. Since the generated model is human-readable, it can not only classify new records but also help users understand both the logic behind each prediction and the properties of the model, making it a useful aid for decision makers.
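
    Gini impurity measures how mixed the class labels are among the records matched by a rule antecedent: for class proportions p_c it is 1 - sum_c p_c^2, which is 0 for a pure set and grows as the classes mix. A minimal sketch of impurity-based pruning follows; the input format, the `max_gini` threshold, and the feature names are hypothetical, and DAC's actual pruning criterion may differ in detail.

```python
from collections import Counter

def gini(class_counts):
    """Gini impurity 1 - sum_c p_c^2 of a class distribution."""
    total = sum(class_counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in class_counts.values())

def prune_rules(candidate_rules, max_gini):
    """Keep only antecedents whose matching records are nearly pure.

    candidate_rules: dict mapping an antecedent (frozenset of feature=value
    items) to a Counter of class labels among matching records (hypothetical
    format). Returns antecedent -> (predicted class, impurity) for survivors.
    """
    kept = {}
    for antecedent, counts in candidate_rules.items():
        g = gini(counts)
        if g <= max_gini:
            label, _ = counts.most_common(1)[0]
            kept[antecedent] = (label, g)
    return kept

candidates = {
    frozenset({"browser=firefox"}): Counter({"click": 9, "no_click": 1}),
    frozenset({"country=IT"}): Counter({"click": 5, "no_click": 5}),
}
print(prune_rules(candidates, max_gini=0.2))
```

    Pruning during extraction, rather than after it, avoids materializing rules that could never become confident predictors, which is what matters at billions of records.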

    Extractive Conversation Summarization Driven by Textual Entailment Prediction

    Summarizing conversations such as meetings, email threads, or discussion forums poses specific challenges in modeling the dialogue structure. Existing approaches mainly focus on premise-claim entailment relationships while neglecting contrasting or uncertain assertions. Furthermore, existing techniques are abstractive and thus require a training set of human-written summaries. With the twofold aim of enriching the dialogue representation and addressing conversation summarization in the absence of training data, we present an extractive conversation summarization pipeline. We explore the use of contradiction and neutral premise-claim relations, both within the same document and across different documents. The results achieved on four datasets covering different domains show that applying unsupervised methods on top of a refined premise-claim selection achieves competitive performance in most domains.

    Leveraging Explainable AI to Support Cryptocurrency Investors

    In the last decade, cryptocurrency trading has attracted the attention of private and professional traders and investors. Algorithmic trading systems based on Artificial Intelligence (AI) models are becoming increasingly established for forecasting financial markets. However, they suffer from a lack of transparency, which prevents domain experts from directly monitoring the fundamentals behind market movements. This is particularly critical for cryptocurrency investors, because studying the main factors that influence cryptocurrency prices, including the characteristics of the blockchain infrastructure, is crucial for driving experts’ decisions. This paper proposes a new visual analytics tool to support domain experts in explaining AI-based cryptocurrency trading systems. To describe the rationale behind AI models, it exploits an established method, SHapley Additive exPlanations (SHAP), which lets experts identify the most discriminating features, and it provides them with an interactive, easy-to-use graphical interface. Simulations carried out on 21 cryptocurrencies over an 8-year period demonstrate the usability of the proposed tool.
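
    SHAP builds on Shapley values from cooperative game theory: each feature's attribution is its average marginal contribution to the model output over all coalitions of the other features. The sketch below computes exact Shapley values by enumerating coalitions, which is feasible only for a handful of features; it is not the `shap` library's estimator. It assumes that a feature absent from a coalition takes a baseline value, and the toy model and its three inputs (standing in for, e.g., blockchain or market indicators) are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to a baseline input.

    Features outside a coalition are replaced by their baseline value (one
    common convention, assumed here). Cost grows as 2**n.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

# Hypothetical model with linear terms and one interaction between inputs 0 and 2.
model = lambda z: 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[0] * z[2]
x = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
print(shapley_values(model, x, baseline))
```

    The attributions always sum to model(x) - model(baseline), which is what makes them readable as a breakdown of a single prediction; here the interaction term is split equally between the two features involved.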

    A Data-Driven Based Dynamic Rebalancing Methodology for Bike Sharing Systems

    Mobility in cities is a fundamental asset and raises several problems in decision making and in the creation of new services for citizens. In recent years, transportation sharing systems have grown steadily, and among them bike sharing systems have become widely adopted. There are two categories of bike sharing systems: station-based systems and free-floating services. In this paper, we focus on station-based systems. Such systems require periodic rebalancing operations, which move bicycles from full stations to empty ones, to guarantee good quality of service and system usability. Specifically, we propose a dynamic bicycle rebalancing methodology based on frequent pattern mining, together with its implementation. The extracted patterns represent frequent unbalanced situations among nearby stations. They are used to predict upcoming critical states and to plan the most effective rebalancing operations with an entirely data-driven approach. Experiments performed on real data from the Barcelona bike sharing system show the effectiveness of the proposed approach.
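
    A minimal sketch of mining frequent unbalanced situations among nearby stations follows. It assumes hypothetical inputs: periodic occupancy snapshots, station capacities, and a precomputed list of nearby station pairs. A station is labeled "empty" or "full" using illustrative occupancy thresholds, and the sketch counts how often two nearby stations are critical at the same time. The paper's exact pattern definition and thresholds are not reproduced.

```python
from collections import Counter

def station_status(bikes, capacity, low=0.1, high=0.9):
    """Classify occupancy as 'empty', 'full' or 'ok' (thresholds are assumptions)."""
    ratio = bikes / capacity
    if ratio <= low:
        return "empty"
    if ratio >= high:
        return "full"
    return "ok"

def frequent_unbalanced_pairs(snapshots, capacity, neighbors, min_support):
    """Find nearby station pairs that are frequently critical at the same time.

    snapshots: list of dicts station -> available bikes at one timestamp.
    capacity: dict station -> number of docks.
    neighbors: iterable of (station_a, station_b) pairs considered nearby.
    Returns {(a, status_a, b, status_b): support} for critical co-occurrences.
    """
    counts = Counter()
    for snap in snapshots:
        status = {s: station_status(b, capacity[s]) for s, b in snap.items()}
        for a, b in neighbors:
            sa, sb = status.get(a, "ok"), status.get(b, "ok")
            if sa != "ok" and sb != "ok":
                counts[(a, sa, b, sb)] += 1
    total = len(snapshots)
    return {k: n / total for k, n in counts.items() if n / total >= min_support}

capacity = {"S1": 20, "S2": 20, "S3": 20}
neighbors = [("S1", "S2"), ("S2", "S3")]
snapshots = [
    {"S1": 20, "S2": 0, "S3": 10},
    {"S1": 19, "S2": 1, "S3": 10},
    {"S1": 10, "S2": 10, "S3": 0},
]
print(frequent_unbalanced_pairs(snapshots, capacity, neighbors, min_support=0.5))
```

    In this sketch, a frequent (full, empty) pair of nearby stations suggests a recurring rebalancing move from the first station to the second.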

    Mining SpatioTemporally Invariant Patterns

    Double-Step U-Net: A Deep Learning-Based Approach for the Estimation of Wildfire Damage Severity through Sentinel-2 Satellite Data

    Wildfire damage severity census is a crucial activity for estimating monetary losses and for planning a prompt restoration of the affected areas. After a wildfire, it consists of assigning a numerical damage/severity level, between 0 and 4, to each sub-area of the affected area. While burned area identification has been automated by means of machine learning algorithms, the damage severity census is usually still performed manually and requires significant effort from domain experts, who analyze imagery and, sometimes, carry out on-site missions. In this paper, we propose a novel supervised learning approach for automatically estimating the damage/severity level of the affected areas after the wildfire has been extinguished. Specifically, the proposed approach combines a classification algorithm and a regression algorithm to predict the damage/severity level of the sub-areas of the area under analysis from a single post-fire satellite acquisition. Our approach has been validated on 21 wildfires in five European countries, and it proved robust across several geographical contexts with similar geological characteristics.
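
    One plausible way to combine a classification step and a regression step is to let the classifier decide which pixels are burned and the regressor decide how severely. The sketch below assumes exactly that: pixels classified as unburned get level 0, and the others get the regressed severity rounded and clipped to the 0-4 scale. The combination rule, the 0.5 threshold, and the function name are assumptions, and the trained models themselves (the Double-Step U-Net) are not reproduced.

```python
import numpy as np

def combine_two_step(burn_prob, severity_pred, threshold=0.5):
    """Merge a burned-area probability map with a regressed severity map.

    burn_prob: per-pixel probability of being burned (classification step).
    severity_pred: per-pixel continuous severity estimate (regression step).
    Returns integer damage levels on the 0-4 scale.
    """
    burned = burn_prob >= threshold
    # Round the regression output to the nearest level and keep it in range.
    levels = np.clip(np.rint(severity_pred), 0, 4).astype(np.int64)
    # Unburned pixels are forced to level 0 regardless of the regressor.
    return np.where(burned, levels, 0)

prob = np.array([[0.9, 0.2], [0.7, 0.6]])
sev = np.array([[3.4, 2.8], [4.6, 0.4]])
print(combine_two_step(prob, sev))
```

    Masking with the classifier keeps spurious nonzero regression outputs out of unburned areas, which a single regressor trained on mostly-unburned imagery tends to produce.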

    Density-based Clustering by Means of Bridge Point Identification

    Density-based clustering defines clusters as contiguous regions characterized by similar densities of points. Traditional approaches identify core points first, whereas more recent ones first identify the cluster borders and then propagate cluster labels within the delimited regions. Both strategies encounter issues in the presence of multi-density regions or when clusters have noisy borders. To overcome these issues, we present a new clustering algorithm that relies on the concept of bridge point. A bridge point is a point whose neighborhood includes points of different clusters. The key idea is to use bridge points, rather than border points, to partition points into clusters. We have proved that correctly identifying bridge points yields a cluster separation consistent with expectations. To identify bridge points in the absence of a priori cluster information, we leverage an established unsupervised outlier detection algorithm. Specifically, we empirically show that, in most cases, the detected outliers are actually a superset of the bridge point set. Therefore, to define clusters, we spread cluster labels like a wildfire until an outlier, acting as a candidate bridge point, is reached. The proposed algorithm performs statistically better than state-of-the-art methods on a large set of benchmark datasets and is particularly robust to intra-cluster multiple densities and noisy borders.
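
    The label-spreading step can be sketched as a breadth-first flood over fixed-radius neighborhoods that never propagates through points flagged as outliers. This is a minimal sketch under assumptions: the outlier flags are taken as given (the paper obtains them from an established outlier detector, which is not reproduced), `eps` is a hypothetical neighborhood radius, and outliers are simply left unassigned (-1) because their final assignment is not described here.

```python
import math
from collections import deque

def spread_labels(points, eps, is_outlier):
    """Grow clusters by flooding eps-neighborhoods, stopping at outliers.

    points: list of coordinate tuples.
    eps: neighborhood radius (hypothetical parameter).
    is_outlier: list of booleans; outliers act as candidate bridge points,
    never propagate labels, and stay unassigned (-1) in this sketch.
    """
    n = len(points)
    labels = [-1] * n
    neighbors = [
        [j for j in range(n) if j != i and math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    cluster = 0
    for seed in range(n):
        if labels[seed] != -1 or is_outlier[seed]:
            continue
        labels[seed] = cluster
        queue = deque([seed])
        while queue:
            i = queue.popleft()
            for j in neighbors[i]:
                if labels[j] == -1 and not is_outlier[j]:
                    labels[j] = cluster
                    queue.append(j)
        cluster += 1
    return labels

# Two groups on a line joined by a single point at x = 3.
points = [(float(x), 0.0) for x in range(7)]
flags = [False, False, False, True, False, False, False]
print(spread_labels(points, eps=1.0, is_outlier=flags))
```

    Without the flag on the middle point, the flood would cross it and merge both groups into one cluster; stopping at the candidate bridge point is what separates them.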

    CaBuAr: California burned areas dataset for delineation [Software and Data Sets]

    Forest wildfires are among the catastrophic events that, over the last decades, have caused huge environmental and humanitarian damage. In addition to significant carbon dioxide emissions, they are a source of risk to society in both the short term (e.g., temporary city evacuation due to fire) and the long term (e.g., higher risk of landslides). Consequently, tools that help local authorities automatically identify burned areas play an important role in the continuous monitoring needed to alleviate the aftereffects of such catastrophic events. The wide availability of satellite acquisitions, coupled with computer vision techniques, is an important step toward developing such tools.